ABSTRACT



Intended for writers of instructional materials for teaching 
English as a Second Language (ESL), the list describes word lists and
language corpora that may be of use in creating, simplifying, and refining 
vocabulary content in ESL materials. The sources are dated from 1944 to the 
present, and include a freeware computer program. Some limited comments are 
made about the utility of the lists/corpora. A brief list of related World 
Wide Web sites and six print references are provided. A resource containing a 
discussion of vocabulary instruction is also noted. (MSE) 
Vocabulary Resources for Material Writers

From The Materials Writers Newsletter 

The Newsletter of the Materials Writers' National Special Interest Group 
of the Japan Association of Language Teachers 
Vol. IV, No. 3, October 1996 

John Bauman 
Temple University Japan 

Material written for ESL students needs to use somewhat simplified 
vocabulary and structure if it is to be accessible to lower and intermediate 
level students. In terms of vocabulary, a writer can try to "keep it simple" 
while writing, but a more rigorous approach is to compare a text with a list 
of words prepared for this purpose. A variety of lists of words are 
available, as well as different ways to use them. In this article, I will
briefly list and describe some of these lists. I'll also discuss a program that
analyzes a text against such lists, and I'll give some links for further
exploration of this topic on the internet. URLs are given in the "World Wide
Web URLs" section following this article.

Teaching and Learning Vocabulary (Nation, 1990) contains a good
general discussion of this topic. Nation doesn't hesitate to quantify the issue. 

His model of an ideal vocabulary teaching sequence starts with the most 
frequent 2,000 words, which he calls general service vocabulary. Everybody 
needs to know these words; they make up about 87% of the running words in an
average written text. After this point, general frequency becomes less useful as a
guide to what words to teach. Students are better off studying a list of 
words specific to their field of interest or need, if one can be found. For 
the student aiming at English-language higher education, Nation's 800-word
University Word List is appropriate. After this, the remaining vocabulary of
English is of too little frequency to merit direct study. Instead, skills such
as analyzing word parts and guessing meaning from context can be taught.
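Nation's coverage figure is cumulative. It is the share of all running words (tokens) in a body of text that the most frequent N word types account for. The arithmetic is simple enough to sketch in Python. This is a minimal illustration that assumes the frequency list is held as a `{word: count}` dictionary. The function name `cumulative_coverage` and the toy counts are invented here; they are not drawn from any list discussed in this article.

```python
from collections import Counter


def cumulative_coverage(freqs: dict[str, int], n: int) -> float:
    """Percentage of all running words (tokens) accounted for by the n most
    frequent word types in a {word: count} frequency list."""
    total = sum(freqs.values())
    if total == 0:
        return 0.0
    covered = sum(count for _, count in Counter(freqs).most_common(n))
    return 100.0 * covered / total


# Toy counts invented for illustration; they are NOT real corpus figures.
toy = {"the": 50, "of": 30, "and": 20, "vocabulary": 5, "lemma": 3, "corpus": 2}
print(round(cumulative_coverage(toy, 3), 1))  # 90.9
```

With a real frequency list, `cumulative_coverage(freqs, 2000)` would give a figure comparable to Nation's estimate for that particular corpus. The result depends on the corpus and on how "word" is defined.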

The number of different words used will depend on the level of the text. Writers
of material for ESL learners also have to decide which words to use or, more
broadly, to which population of words they should restrict themselves. Here
a list becomes necessary. Many have been developed over the years. The 
following remain relevant. 

The General Service List 

The General Service List (GSL) (West, 1953) is the specific list of 2,000
words that Nation refers to when he writes about the "first 2,000 words." 

It's based on written texts, it's old, and it's not in frequency order, though 
frequency numbers are given. The source of the frequency information is even 
earlier than the publication date, being derived from Thorndike and Lorge (1944). 
But the list was not compiled based on frequency alone. It was created to be 
an ideal vocabulary for ESL students to start out with. Through the 1970s, a 
lot of material, particularly graded readers, was based on this list. Even today, 
much of this material is sold and used. The GSL is out of print, and somewhat 
out of favor. The list is available as a component of the Vocabprofile program 
described below and, in a slightly different form, on my web page. 
Thorndike and Lorge

The Teacher's Word Book of 30,000 Words (Thorndike and Lorge, 1944) was 
created as a resource for elementary and high school teachers in the United 
States. It is still frequently cited, though computer-produced corpora have largely 
replaced it as an authority on the frequency of words. For example, it's the 
source of the words above the 2,000 word level in the vocabulary test in 
Nation (1990). It's old, and it's based on a compilation of pre-WW2,
non-computerized word counts totaling about 18 million written words. As
published, it's not in
frequency order, but frequency ranks are given for each word. 

The University Word List 

The University Word List (UWL) (in Nation, 1990) is a list of academic
vocabulary composed of about 800 words. It's designed for students who plan 
to study in an English-language college or university. Essentially, it's the 
most common 800 words in academic texts, excluding the 2,000 words of the 
GSL. This list is structurally linked to the GSL. A student who studies the 
GSL, followed by the UWL, will find no repetition of words. The
list is divided into 11 parts. Part 1 has the greatest frequency and
range, Part 2 the next greatest, and so on. This list is also a component of
the Vocabprofile program.
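The relationship between the two lists can be pictured as a filtering step. You count the words in a body of academic text, discard anything already on the GSL, and keep the most frequent remainder. The Python sketch below shows only that exclusion step, under several assumptions:

* a crude regular-expression tokenizer;
* raw frequency as the only criterion;
* small hypothetical stand-ins (`gsl_sample`, `texts`) for the GSL and an academic corpus.

It is not how the UWL was actually compiled; as noted above, range matters too.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def academic_list(corpus_texts: list[str], general_list: set[str], n: int) -> list[str]:
    """The n most frequent words in the corpus that are NOT on general_list."""
    counts = Counter()
    for text in corpus_texts:
        counts.update(w for w in tokenize(text) if w not in general_list)
    return [word for word, _ in counts.most_common(n)]


# Hypothetical stand-ins for the GSL and for a corpus of academic texts.
gsl_sample = {"the", "of", "and", "is", "a", "in", "this", "to"}
texts = [
    "The hypothesis is tested in this analysis of the data.",
    "Data analysis is central to the hypothesis.",
]
uwl_like = academic_list(texts, gsl_sample, 3)
print(sorted(uwl_like))  # ['analysis', 'data', 'hypothesis']
assert not set(uwl_like) & gsl_sample  # no word appears on both lists
```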

The Brown Corpus 

The Brown Corpus (Francis and Kucera, 1982) is the earliest computerized study 
of English vocabulary. It is an analysis of 1 million words published in the 
United States in 1961. It's also kind of old, but it's more consistent in its
definition of "word" (as a lemma) than the earlier lists. The 1982 publication, 
which includes both alphabetical and frequency order lists of the words, is a 
very useful resource. 

The LOB Corpus 

The LOB Corpus (Hofland and Johansson, 1982) is a study of 1 million words
of British text published in 1961. It was designed to be a British 
counterpart to the Brown corpus. 

The Cambridge English Lexicon 

The Cambridge English Lexicon (CEL) (Hindmarsh, 1980) is a list of 4,470
words, prepared with reference to the GSL, Thorndike and Lorge, Brown, other 
sources, and the author’s experience as an ESL teacher and material 
developer. Each item is graded from 1 to 5. The most useful aspect of the 
list is that the different meanings of the words are also graded on the same 
scale. Only the CEL and the GSL give separate information on the different 
meanings of common words (though, of course, dictionaries do also). The GSL 
gives actual frequency numbers for the different meanings, but the age of 
the data and the fact that it was gathered by hand may make the CEL a 
more reliable source for an indication of the relative importance to students of 
different meanings of words. The grading in the CEL is not based solely on 
frequency. 
Modern Corpora

These days, much is heard about corpora from dictionary 
publishers, who all boast about the enormous corpora that their learner 
dictionaries are based on. The British publishers are particularly 
enthusiastic about this, using either the CoBuild corpus or the British 
National Corpus (BNC) as a source of lexicographic information. Both of 
these corpora contain more than 100 million words. Limited access to them is
possible through the internet; see the links on the Collocations Homepage
listed below. Depending on your purpose, it may be more useful to access 
these corpora in pre-digested form through the dictionaries based on them. A 
lemmatized frequency list of the BNC has been prepared by Adam Kilgarriff 
and is available for FTP. 
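Frequency order gives every lemma in such a list a rank. That rank is what makes it possible to define a population of words (say, the top 2,000) and to compare individual words. The Python sketch below is minimal and assumes a plain-text file with one entry per line in the form `frequency word part-of-speech`. That layout is an assumption for illustration, so check the actual file before relying on it. The sample lines and their counts are hypothetical.

```python
from typing import NamedTuple, Optional


class Lemma(NamedTuple):
    word: str
    pos: str
    freq: int


def parse_lemma_list(lines: list[str]) -> list[Lemma]:
    """Parse lines ASSUMED to read 'frequency word part-of-speech' and return
    the entries in frequency order (most frequent first)."""
    entries = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3 or not parts[0].isdigit():
            continue  # skip headers, blank or malformed lines
        freq, word, pos = parts
        entries.append(Lemma(word, pos, int(freq)))
    return sorted(entries, key=lambda e: e.freq, reverse=True)


def rank_of(entries: list[Lemma], word: str, pos: str) -> Optional[int]:
    """1-based frequency rank of a (word, pos) lemma, or None if it is absent."""
    for rank, entry in enumerate(entries, start=1):
        if (entry.word, entry.pos) == (word, pos):
            return rank
    return None


# Hypothetical sample lines; the counts are invented, not BNC figures.
sample = ["120 run verb", "900 the det", "45 run noun", "300 house noun"]
ranked = parse_lemma_list(sample)
top_two = {(e.word, e.pos) for e in ranked[:2]}  # a population defined by rank
print(rank_of(ranked, "run", "noun"))  # 4
print(sorted(top_two))  # [('house', 'noun'), ('the', 'det')]
```

In this sample, "run" appears twice, once as a verb and once as a noun. Each part of speech is a separate lemma with its own rank.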

Vocabprofile 

Vocabprofile is a freeware program for PCs that will compare a given text 
with any properly formatted list. Three lists can be compared at a time. The
output will report what percent of the words in the text are on each of the 
lists. It will also print the text with the words marked to indicate which 
list they are on, or if they aren't on a list. Vocabprofile is available for 
FTP at the URL below. The three lists that come with the program are the 
first 1,000 words of the GSL, the second 1,000 words of the GSL, and the UWL.
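Vocabprofile reports two things: the percentage of a text's words on each list, and a marked-up copy of the text. Both can be approximated in a few lines of Python. This is not Vocabprofile's own code or output format. The sketch makes these assumptions:

* each list is a set of lowercase word forms;
* a word is credited to the first list that contains it;
* exact forms are matched, with no grouping of inflections.

The `word/1` marking style and the three toy lists are invented for illustration.

```python
import re
from collections import Counter

WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def profile(text: str, word_lists: list[set[str]]) -> tuple[dict[str, float], str]:
    """Return (percent of tokens per list, marked-up text).

    Each token is assigned to the FIRST list containing its lowercase form;
    tokens on no list are labelled '-' (off-list). Exact word forms only:
    no grouping of inflections or word families.
    """
    tokens = WORD_RE.findall(text)
    counts = Counter()
    marked = []
    for tok in tokens:
        label = "-"
        for i, word_list in enumerate(word_lists, start=1):
            if tok.lower() in word_list:
                label = str(i)
                break
        counts[label] += 1
        marked.append(f"{tok}/{label}")
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    labels = [str(i) for i in range(1, len(word_lists) + 1)] + ["-"]
    percents = {lab: 100.0 * counts[lab] / total for lab in labels}
    return percents, " ".join(marked)


# Toy stand-ins for three lists (NOT the real GSL or UWL contents).
first_1000 = {"the", "a", "is", "of"}
second_1000 = {"lesson"}
academic = {"analysis"}
percents, marked = profile("The analysis of the lesson is a lexicon.",
                           [first_1000, second_1000, academic])
print(percents)  # {'1': 62.5, '2': 12.5, '3': 12.5, '-': 12.5}
print(marked)    # The/1 analysis/3 of/1 the/1 lesson/2 is/1 a/1 lexicon/-
```

The two halves of the GSL and the UWL do not overlap. Because of that, the "first list wins" rule only matters if you supply lists that do.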

Concluding Remarks 

None of these resources is ideal. Thorndike and Lorge and the GSL are old
enough that the English of today surely differs significantly from the English
they record. However, the
core vocabulary of English changes more slowly, so at the frequency level of 
the first 2,000 words this may be less of a problem. The GSL offers some 
advantages as a standard. It was specifically designed as a teaching vocabulary 
list. It has a long history of use, both in teaching materials and in second 
language acquisition research. A program to compare it with a given text is 
readily available. Of the lists above, only the CEL was also compiled for the 
purpose of facilitating the creation of teaching materials. It's more modern
than the GSL, but appears to have had less impact. It is not conveniently 
available for computerized text comparison. 

The Brown Corpus, the LOB Corpus and the lemmatized list from the BNC are 
useful because they give the lists in frequency order. This allows a 
population of words to be defined much more precisely, and individual words 
to be compared with each other. But these lists were prepared for linguistic
research, not for teachers. They’re lists of lemmas, which means that words are
listed more than once if they can act as more than one part of speech. Some
derived forms, such as the comparative and superlative forms of adjectives, are
also treated as separate lemmas. These factors affect both the frequency
rankings of words and the number of words that appear on a list. In other
words, a list of 1,000 words taken from the GSL or CEL would contain more
than 1,000 lemmas. These corpus-based lists need substantial adjustment to 
make them appropriate as vocabulary standards. These adjustments have 
already been made to the GSL and CEL. 
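One such adjustment is merging part-of-speech lemmas back into single headwords. It can be sketched in Python as follows. The sketch assumes lemma entries arrive as `(word, part_of_speech, frequency)` triples and simply sums frequencies per headword. Folding derived forms such as comparatives into a base word needs an explicit mapping. The `BASE_FORM` dictionary here is a hypothetical stand-in for that mapping, and all the frequencies are invented. This is not the procedure used for the GSL or CEL.

```python
from collections import defaultdict

# Hypothetical mapping of derived forms to the headword they belong under.
BASE_FORM = {"bigger": "big", "biggest": "big"}


def to_headwords(lemmas: list[tuple[str, str, int]]) -> dict[str, int]:
    """Merge (word, part_of_speech, frequency) lemma entries into
    headword -> summed frequency, ignoring part of speech."""
    totals = defaultdict(int)
    for word, _pos, freq in lemmas:
        headword = BASE_FORM.get(word, word)  # fall back to the word itself
        totals[headword] += freq
    return dict(totals)


lemmas = [  # invented frequencies, for illustration only
    ("run", "verb", 120), ("run", "noun", 45),
    ("big", "adj", 80), ("bigger", "adj", 50), ("biggest", "adj", 40),
    ("house", "noun", 300),
]
headwords = to_headwords(lemmas)
print(len(lemmas), "lemmas ->", len(headwords), "headwords")  # 6 lemmas -> 3 headwords
print(sorted(headwords.items(), key=lambda kv: kv[1], reverse=True))
# [('house', 300), ('big', 170), ('run', 165)]
```

Before merging, the single lemma "run" (verb, 120) outranks "big" (adjective, 80). After merging, "big" (170) moves ahead of "run" (165). The same words are ranked differently, and six entries are reduced to three.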
An author of EFL material has many vocabulary options available. I hope this
discussion of resources is useful and that the bibliography and the internet 
sites below will be helpful in finding the items that will serve your 
specific needs. 

World Wide Web URLs

Adam Kilgarriff 

http://www.itri.brighton.ac.uk/~Adam.Kilgarriff/

Links to his lemmatized, frequency-ordered version of the BNC are here.

John Higgins 

http://www.stir.ac.uk/epd/celt/staff/higdox/listers.htm

Here you can find Vocabprofile as well as links to other programs. 

Collocations Homepage 

http://www.ed.uiuc.edu/students/jc-lai/Fall95/ 

Jennifer Lai has collected links to corpora and other lexical resources 
here. 

John Bauman 

http://plaza3.mbn.or.jp/~bauman 

This article, the UWL, the GSL, and some other resources are here. 